Support non-interpolating quantile definitions #187

nalimilan · 2025-01-22T21:05:16Z

Add a `type` argument to `quantile` to support the three remaining (non-interpolating) types that were not yet supported. Some of these are particularly useful because they return actual values from the data and work for types that do not support arithmetic.

Fixes #185.

codecov-commenter · 2025-01-22T21:07:50Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.84%. Comparing base (0ce3149) to head (2b2595a).

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #187      +/-   ##
==========================================
+ Coverage   96.66%   96.84%   +0.17%     
==========================================
  Files           2        2              
  Lines         450      475      +25     
==========================================
+ Hits          435      460      +25     
  Misses         15       15


nalimilan · 2025-01-22T21:09:06Z

src/Statistics.jl

+    if type == 1
+        return v[clamp(ceil(Int, n*p), 1, n)]
+    elseif type == 2
+        i = clamp(ceil(Int, n*p), 1, n)
+        j = clamp(floor(Int, n*p + 1), 1, n)
+        return middle(v[i], v[j])
+    elseif type == 3
+        return v[clamp(round(Int, n*p), 1, n)]


I have used simplified formulas specific to each case, as I find the code resulting from the single general formula in the Hyndman & Fan paper very hard to follow, and it brings no advantage. I hope I didn't introduce mistakes, especially in corner cases. Please suggest things to test if you can find some that are not covered.
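The three branches can be exercised in isolation. The following is a minimal sketch, assuming `v` is already sorted and non-empty; `discrete_quantile` is a hypothetical name used only for illustration, not the function added by this PR.

```julia
using Statistics: middle

# Hypothetical helper for illustration; `v` must be sorted and non-empty.
function discrete_quantile(v::AbstractVector, p::Real, type::Integer)
    n = length(v)
    if type == 1
        # Inverse of the empirical CDF: order statistic at index ceil(n*p)
        return v[clamp(ceil(Int, n*p), 1, n)]
    elseif type == 2
        # Same as type 1, but averages the two neighbours when n*p is an integer
        i = clamp(ceil(Int, n*p), 1, n)
        j = clamp(floor(Int, n*p + 1), 1, n)
        return middle(v[i], v[j])
    elseif type == 3
        # Nearest order statistic; `round` breaks ties towards the even index
        return v[clamp(round(Int, n*p), 1, n)]
    else
        throw(ArgumentError("only types 1, 2 and 3 are sketched here"))
    end
end

v = [10, 20, 30, 40]
discrete_quantile(v, 0.5, 1)    # 20, an element of v
discrete_quantile(v, 0.5, 2)    # 25.0, midpoint of 20 and 30
discrete_quantile(v, 0.625, 3)  # 20, since n*p = 2.5 rounds to the even index 2
discrete_quantile(["a", "b", "c", "d"], 0.5, 1)  # "b", no arithmetic required
```

Types 1 and 3 only index into `v`, which is why they also work for element types such as strings.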


nalimilan · 2025-01-22T21:16:51Z

test/runtests.jl

    @test quantile(v, 1.0, alpha=1.0, beta=1.0) ≈ 21.0

+    # tests against R's quantile with type=1
+    @test quantile(v, 0.0, type=1) === 2


As you can see, the type of the result can be different for types 1 and 3, as we keep the original type instead of using whatever type arithmetic operations produce. This makes sense and is even necessary if we want to work for types that don't support arithmetic.

But this means the inferred return type when passing `type` is a `Union` of two types. Maybe that is OK, as it's small enough to be optimized out? If not, there are probably ways to ensure inference works via combined use of `Val(type)` and `@inline` in a wrapper function.

EDIT: The situation is slightly worse for `quantile([1, 2], [0.1, 0.2])`, as the inferred type is `Union{Vector{Float64}, Vector{Int64}, Vector{Real}}`.

(Note that when omitting `type`, the inferred type is concrete as before, so at least there's no regression.)
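The `Val(type)` idea can be sketched as follows. This is an illustration only, under the assumption that each definition lives in its own method; `pick_quantile` and `_pick` are hypothetical names, not part of Statistics.jl.

```julia
using Statistics: middle

# Hypothetical internal methods, one per definition; `v` is assumed sorted.
# Each method has a single concrete return type for a given input type.
_pick(v, p, ::Val{1}) = v[clamp(ceil(Int, length(v)*p), 1, length(v))]
function _pick(v, p, ::Val{2})
    n = length(v)
    i = clamp(ceil(Int, n*p), 1, n)
    j = clamp(floor(Int, n*p + 1), 1, n)
    return middle(v[i], v[j])
end

# Hypothetical wrapper: when `type` is a literal at the call site and the
# wrapper is inlined, the compiler may be able to resolve the dispatch statically.
@inline pick_quantile(v, p; type::Integer=1) = _pick(v, p, Val(Int(type)))

v = [10, 20, 30, 40]
typeof(pick_quantile(v, 0.5, type=1))  # Int: the element type is preserved
typeof(pick_quantile(v, 0.5, type=2))  # Float64: `middle` performs arithmetic
```

Whether inference actually succeeds through the keyword argument depends on the Julia version; `Test.@inferred` or `@code_warntype` can be used to check a given call.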

I would add tests for `p=0` and `p=1` (i.e. integer `p`), or even `p=false` and `p=true` (I understand we allow it).



bkamins · 2025-02-05T11:24:44Z

src/Statistics.jl

+        m = alpha + p * (one(alpha) - alpha - beta)
+        # Using fma here avoids some rounding errors when aleph is an integer
+        # The use of oftype supresses the promotion caused by alpha and beta
+        aleph = fma(n, p, oftype(p, m))


This will error if `p=0` or `p=1` (i.e. integer `p`) and `m` is a fraction.

Good catch. Though this PR doesn't touch this code, let's handle it separately?

Reproducer:

quantile(1:3, 0, alpha=0.2, beta=0.0)
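The failure comes from `oftype(p, m)`: with an integer `p`, a fractional `m` cannot be converted to the type of `p`. Below is a minimal sketch reproducing just that step, independently of the package. Converting `p` with `float` first is one possible workaround, shown here as an assumption rather than as the fix that was eventually adopted.

```julia
alpha, beta = 0.2, 0.0
n, p = 3, 0                      # integer p, as in the reproducer

m = alpha + p * (one(alpha) - alpha - beta)   # 0.2

# oftype(0, 0.2) tries to convert 0.2 to Int and throws
err = try
    oftype(p, m)
    nothing
catch e
    e
end
err isa InexactError             # true

# Possible workaround (assumption): make p floating-point before using oftype
pf = float(p)
aleph = fma(n, pf, oftype(pf, m))   # 0.2
```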

bkamins · 2025-02-05T11:26:10Z

src/Statistics.jl

+        # Using fma here avoids some rounding errors when aleph is an integer
+        # The use of oftype supresses the promotion caused by alpha and beta
+        aleph = fma(n, p, oftype(p, m))
+        j = clamp(trunc(Int, aleph), 1, n - 1)


Is this clamp correct if `p=1`? It seems to me that then `j` should be `n` and later we just need to make sure not to use `j+1`.

I think so, because `γ` is 1 in that case and we return `(1-γ)*v[j] + γ*v[j + 1]` (so we don't care about `v[j]`).
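To make the argument concrete, here is a worked sketch for `p = 1` with the default `alpha = beta = 1`. It mirrors the lines quoted in the diff, with the weight computed as `γ = clamp(aleph - j, 0, 1)`, which is an assumption about the surrounding code that is not shown in this hunk.

```julia
v = [10, 20, 30, 40]             # sorted data
n = length(v)
alpha, beta, p = 1.0, 1.0, 1.0

m = alpha + p * (one(alpha) - alpha - beta)   # 1 + 1*(1 - 1 - 1) = 0.0
aleph = fma(n, p, oftype(p, m))               # n*p + m = 4.0
j = clamp(trunc(Int, aleph), 1, n - 1)        # 3, not 4, because of the clamp
γ = clamp(aleph - j, 0, 1)                    # 1.0 (assumed formula for the weight)

# With γ == 1 the weight on v[j] vanishes, so the result is v[j + 1] == v[n]
q = (1 - γ) * v[j] + γ * v[j + 1]             # 40.0
```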


nalimilan · 2025-02-19T08:36:20Z

Gentle bump. :-)

andreasnoack

Looks good to me. Just some minor comments/questions.

andreasnoack · 2025-03-09T19:21:37Z

test/runtests.jl

    @test quantile(v, 1.0, alpha=1.0, beta=1.0) ≈ 21.0

+    # tests against R's quantile with type=1
+    @test quantile(v, 0.0, type=1) === 2


Shouldn't constant propagation be able to handle this? I.e. make `quantile_type1(v, p) = quantile(v, p, type=1)` inferred?

andreasnoack · 2025-03-09T19:29:41Z

src/Statistics.jl

+        alpha = (0.0, 1/2, 0.0, 1.0, 1/3, 3/8)[type-3]
+        beta  = (1.0, 1/2, 0.0, 1.0, 1/3, 3/8)[type-3]


I know it is no different from the current implementation, but are we certain that these values should be `Float64`? E.g. if all arguments were `Float32` and you wanted to avoid intermediate promotion. For elements with higher precision, I suppose this would also mean a loss of precision, because `1/3` is not representable as a binary float. Since it only affects `type=8`, and is therefore a fairly negligible corner case, we can maybe just open an issue to have the loss on record instead of dealing with it here.
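The two concerns raised here (intermediate promotion and the inexact `1/3`) can be checked directly. This is a small sketch using only Base; keeping the constant as a `Rational` is a hypothetical alternative mentioned for illustration, not something this PR does.

```julia
alpha = (0.0, 1/2, 0.0, 1.0, 1/3, 3/8)[8 - 3]   # type = 8 selects 1/3 as a Float64

# Promotion: mixing a Float32 value with the Float64 constant widens the result
x = 1.0f0
typeof(x * alpha)             # Float64
typeof(x * oftype(x, alpha))  # Float32

# Precision: the Float64 constant is only an approximation of 1/3,
# whereas 3/8 is a dyadic fraction and is stored exactly
alpha == 1//3                 # false
3/8 == 3//8                   # true
big(alpha) == big(1)/3        # false: the BigFloat quotient carries more digits of 1/3

# Hypothetical alternative: keep the constant as a Rational and convert late
Float32(1//3)                 # nearest Float32 to 1/3
```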

andreasnoack · 2025-03-09T19:36:54Z

src/Statistics.jl

- Def. 7: `alpha=1`, `beta=1` (Julia, R and NumPy default, Excel `PERCENTILE` and `PERCENTILE.INC`, Python `'inclusive'`)
- Def. 8: `alpha=1/3`, `beta=1/3`
- Def. 9: `alpha=3/8`, `beta=3/8`
+The keyword argument `type` can be used to choose among the 9 definitions


We could consider calling the keyword `definition` to follow the naming of the paper. I guess `type` is what the R functions use.

andreasnoack · 2025-03-09T19:38:33Z

src/Statistics.jl

- Def. 9: `alpha=3/8`, `beta=3/8`
+The keyword argument `type` can be used to choose among the 9 definitions
+in Hyndman and Fan (1996). Alternatively, `alpha` and `beta` allow reproducing
+any of the methods 4-9 defined in this paper. It is not allowed to specify both


Suggested change

- any of the methods 4-9 defined in this paper. It is not allowed to specify both
+ any of the definitions 4-9 of this paper. It is not allowed to specify both

Just to avoid also introducing "method" as another synonym on top of "definition" and "type".

nalimilan requested review from andreasnoack and bkamins January 22, 2025 21:05


nalimilan added 4 commits February 5, 2025 22:38

More tests

80cc545

Doc fixes

54280d8

Add tests

4b09ad9

nalimilan force-pushed the nl/quantile branch from 02e40c2 to 4b09ad9 Compare February 5, 2025 21:39

Fix test

2b2595a


nalimilan mentioned this pull request Apr 16, 2025

Choose different quantile cutpoints in cut(x, n) JuliaData/CategoricalArrays.jl#416

Merged

nalimilan mentioned this pull request May 25, 2025

Request: A quantile function for ordinal data only JuliaLang/julia#27367

Closed
